Highlights
1. A parallel computing method is implemented in Geant4. 
2. The method is used for X-ray imaging reconstruction. 


3. Performance is verified using a complex model. 
Abstract 


Accurate three-dimensional (3-D) reconstruction technology for non-destructive testing based on digital
radiography (DR) is of great importance for alleviating the drawbacks of the existing computed tomography 
(CT)-based method. The commonly used Monte Carlo simulation method ensures well-performing imaging 
results for DR. However, for 3-D reconstruction, it is limited by its high time consumption. To solve this 
problem, this study proposes a parallel computing method to accelerate Monte Carlo simulation for projection 
images with a parallel interface and a specific DR application. The images are utilized for 3-D reconstruction 
of the test model. We verify the accuracy of parallel computing for DR and evaluate the performance of two 
parallel computing modes—multithreaded applications (G4-MT) and message-passing interfaces (G4-MPI)— 
by assessing parallel speedup and efficiency. This study explores the scalability of the hybrid G4-MPI and 
G4-MT modes. The results show that the two parallel computing modes can significantly reduce the Monte
Carlo simulation time because the parallel speedup grows approximately linearly with the number of threads
or nodes and the parallel efficiency is maintained at a high level. The hybrid mode has strong scalability,
as the overall run time of the 180 simulations using 320 threads is 15.35 h with 10 billion particles emitted, 
and the parallel speedup can be up to 151.36. The 3-D reconstruction of the model is achieved based on the 
filtered back projection (FBP) algorithm using 180 projection images obtained with the hybrid G4-MPI and 
G4-MT. The quality of the reconstructed sliced images is satisfactory because the images can reflect the 
internal structure of the test model. This method is applied to a complex model, and the quality of the
reconstructed images is evaluated.

1. Introduction


Non-destructive testing technology can reveal defects inside an object without damaging its
original structure or properties, which makes it highly efficient for quality inspection. Among such techniques,
X-ray-based testing is a popular and effective method for inspecting industrial products. In recent decades,
digital radiography (DR) [1] and computed tomography (CT) [2] have played important roles in
non-destructive testing for industrial and medical examinations. As the energy and penetration ability


of X-ray sources increase, DR and CT can detect high-density industrial components and display the location, 


orientation, shape, and size of defects in workpieces. Both provide highly accurate visualization solutions for 
detecting the internal structures of objects. As CT is typically used to produce images of each slice layer of a 
target object, it can reconstruct a complete 3-D structure, thereby accurately reflecting the internal structure 
information of the object [3]. In contrast, DR is limited to producing two-dimensional (2-D) images; therefore,
its detection efficiency is lower for hidden and overlapping structures [4]. However, CT radiation is high
and the equipment is large, inflexible, and expensive, whereas DR radiation is relatively low and the equipment
is easy to operate, has a shorter imaging time, and is cheaper. Therefore, the use of DR to obtain
3-D reconstructed images of an object is of great importance. Chen et al. applied DR technology to analyze
the 3-D position of defects in gas turbines, and the relative positioning error was less than 2% [5]. Susanto et
al. applied a DR system, which is cheaper than a CT system, to the 3-D reconstruction of an
aluminum step wedge cylinder using the filtered back projection (FBP) method; the resulting image quality was
satisfactory, as the images were easily interpreted by the naked eye [5]. Staub et al. developed an algorithm for computing
digitally reconstructed radiographs (DRRs) to match the real cone-beam CT without artificial adjustments, and
the algorithm produced DRRs in approximately 0.35 s for full detector and CT resolution [6]. Recent studies
[8][9][10][11] have contributed to the development of X-ray imaging.


The DR system is primarily composed of an X-ray source tube, test object, and detector. After the tube emits 
X-ray particles, the particles enter the object and interact with the material through photoelectric absorption, 
Compton scattering, or Rayleigh scattering, such that some of the particles are absorbed or scattered. The 
attenuated particles are captured by the detector crystal and converted into current signals. Subsequently, the 
signals are transformed into grey images with different contrasts. These images are referred to as projection 
images in this paper. The pixel values of the projection images are related to the attenuation of the X-ray intensity
along the path. The deterministic equation for the X-ray intensity attenuation is as
follows [12]:


$I = I_0 e^{-\mu x}$    (1)

where $I$ represents the intensity of the particles incident on the detector, $I_0$ represents the intensity from the
tube, and $\mu$ and $x$ are the linear attenuation coefficient and thickness of the medium interacting with the X-ray,
respectively. For DR, applying Eq. (1) to an object with a simple structure is sufficient, but for a more
complicated model, this method produces severe errors because of particle scattering and other effects
[13]. To obtain high-quality images, applying the Monte Carlo simulation method is an
efficient choice. This is a calculation method based on probability and statistical theory. Souza et al. introduced 
a methodology for DR simulation based on Monte Carlo simulation. They compared simulated and 
experimental images of a steel pipe containing corrosion defects and observed that the two images had good 


consistency [14]. 
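To make the deterministic model of Eq. (1) concrete, the following minimal sketch compares the transmitted intensity along a ray through solid aluminum with that along a ray crossing an air-filled defect. The attenuation coefficient `MU_AL`, the function name, and the path lengths are illustrative assumptions and are not taken from this study; scattering, which motivates the Monte Carlo approach, is deliberately ignored.

```python
import math

# Assumed, illustrative linear attenuation coefficient of aluminum (1/cm).
# It is NOT a value from this study; use tabulated data for real calculations.
MU_AL = 0.33


def transmitted_intensity(i0, mu, thickness_cm):
    """Eq. (1): I = I0 * exp(-mu * x) for a single homogeneous medium."""
    return i0 * math.exp(-mu * thickness_cm)


# Ray through 4 cm of solid aluminum versus a ray crossing a 1.5 cm air defect
# (attenuation in air is neglected, so 2.5 cm of aluminum remains on the path).
i_solid = transmitted_intensity(1.0, MU_AL, 4.0)
i_defect = transmitted_intensity(1.0, MU_AL, 2.5)

# The defect appears brighter by a factor exp(mu * 1.5) in this idealized model.
contrast = i_defect / i_solid
print(f"solid: {i_solid:.4f}, defect: {i_defect:.4f}, contrast: {contrast:.3f}")
```

In this idealized picture the contrast depends only on the missing material thickness, which is why Eq. (1) is adequate for simple objects but not for models in which scattered particles also reach the detector.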


However, a limitation of the Monte Carlo simulation is its significant time consumption, particularly in the 3- 
D reconstruction of objects [15][16]. Because reconstruction requires numerous projection images from 
different viewing angles of the object, Monte Carlo simulations must be performed several times. With the 
rapid development of high-performance computing (HPC) in recent years, large-scale parallel computing has 
effectively solved time-consuming and computationally resource-intensive problems. Parallel computing on 
HPC platforms is usually based on both message passing interface (MPI) and shared memory parallel 
programming (OpenMP) approaches for resource planning and memory sharing, to ensure scalability and 


accuracy [17]. The Monte Carlo simulation toolkit Geant4 provides two parallel computing modes, an MPI-based
mode (G4-MPI) and a multithreaded mode (G4-MT), for accelerating Monte Carlo simulations [18]. Wang
et al. applied a Geant4-based parallel computing method to nuclear logging and evaluated the performances 
of the two modes. The results show that the computing process combining G4-MT and G4-MPI execution can 
significantly reduce the runtime of the Monte Carlo simulations [19]. In addition, studies [20][21][22] have 


demonstrated the great ability of the Geant4 toolkit for X-ray imaging simulation. 


This study proposes a parallel computing method based on two parallel computing modes, G4-MPI and G4- 
MT, for the simulation of DR to obtain projection images for 3-D reconstruction. First, the accuracy of parallel 
computing was validated. Second, the performances of the two modes were evaluated, followed by an 
evaluation of the scalability of the hybrid G4-MPI and G4-MT. After reconstructing the 2-D slicing images, 
the quality of the images was measured. Subsequently, a 3-D view image of the model was reconstructed. 
Finally, the method was applied to a more complex model for validation, and the quality of the reconstructed 


images was measured. 


2. DR system based on Geant4 


Geant4 is a Monte Carlo simulation toolkit developed mainly by the European Organization for Nuclear Research
(CERN) based on C++ object-oriented technology. Owing to the realistic nature of its simulation, it can
provide users with simulation results that do not differ significantly from real experimental results, making it
easy for users to evaluate and modify the simulation experiments. The composition of the DR system and the
projection image with a projection angle of 0° are presented in Fig. 1, whereas the test model parameters are
listed in Table 1. The test model was a large cube with two small cube defects and a cylindrical defect. The three
defects were filled with air, whereas the remainder of the model was made of aluminum. The two small cube defects were
centrally symmetric about the center of the model. This regular, symmetric structure gives the reconstructed sliced
images a clear hierarchy, which simplifies the analysis of the reconstruction results.


Table 1 Parameters of the test model 


Parameter     Large cube   Cylindrical defect 1   Small cube defect 2   Small cube defect 3
Side length   4 cm         /                      1.5 cm                1.5 cm
Height        /            4 cm                   /                     /
Radius        /            0.3 cm                 /                     /
Material      Aluminum     Air                    Air                   Air


In the Geant4 simulation model, X-ray particles with an energy of 180 keV were emitted by the X-ray source
in a cone beam. The direct flat panel detector consisted of a CsI crystal, a-Si:H thin-film transistors (TFT), and
a glass substrate. The detector crystal converts X-rays into visible light according to the incident intensity of
the X-rays, while the TFT array stores the relevant signal to form a pixel matrix [12]. In this study, the size of
the detector was 6 cm × 6 cm × 1.5 cm and the TFT array had a pixel sampling interval of 200 μm. Therefore,
the modeled detector pixel matrix was set to 300 × 300 (6 cm / 200 μm = 300 pixels per side), and thus
projection images with a size of 300 columns × 300 rows could be obtained.

Fig. 1. Projection image of 0° and three components of DR system


3. Implementation of parallel computing function based on Geant4 


3.1 Basic principle of parallel computing 


The primary concept of parallel computing is to divide a large task into subtasks for some processors and 
speed up the calculation; each processor works in concert with the others to conduct subtasks in parallel. MPI 
and OpenMP are primarily used in parallel computing for HPC to facilitate internode information delivery, 


resource allocation, and memory sharing. 


Scalability, which refers to a system’s ability to expand in response to future changes in requirements, is an 
important metric for evaluating parallel computing methods in HPC. Specifically, in this study, satisfactory 
scalability means that the parallel computing method is less time-consuming and ensures favorable efficiency 
when more projection images with more emitting particles are required. This study assessed the scalability 
and performance of the parallel computing mode by considering parallel speed-up and efficiency [22][24]. 
Speedup is defined as the ratio of the single-thread execution time to the execution time when one or more
nodes with multiple threads are used. Parallel speedup, denoted as $S_n$, can be expressed as

$S_n = \frac{T_1}{T_n}$    (2)

where $T_1$ and $T_n$ are the run times using one and $n$ threads, respectively. Parallel efficiency, denoted as $E_n$, is
defined as the average utilization of the $n$ allocated threads, and it can be expressed as

$E_n = \frac{S_n}{n}$    (3)

where $n$ is the number of threads and $S_n$ refers to the speedup with $n$ threads.
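The two definitions above translate directly into code. The sketch below is only an illustration; the function names are hypothetical, and the two run times are those reported later in Table 4 for one thread and for one node with 32 threads.

```python
def parallel_speedup(t_1, t_n):
    """Speedup S_n = T_1 / T_n: single-thread run time over n-thread run time."""
    return t_1 / t_n


def parallel_efficiency(s_n, n):
    """Efficiency E_n = S_n / n, returned as a fraction between 0 and 1."""
    return s_n / n


# Run times in seconds from Table 4: 1 node x 1 thread and 1 node x 32 threads.
t_1, t_32 = 46468, 1607

s_32 = parallel_speedup(t_1, t_32)
e_32 = parallel_efficiency(s_32, 32)
print(f"S_32 = {s_32:.3f}, E_32 = {100 * e_32:.2f}%")
```

An efficiency close to 1 means that almost all of the allocated threads are doing useful work; values well below 1 indicate that communication or waiting overhead dominates.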
3.2 Implementation of parallel computing 


Parallel computing techniques have been widely used since the release of Geant4 (version 10.0). G4-MPI 
cross-node scaling and G4-MT cross-thread scaling are used to realize hybrid parallel computing supported 


by data collection methods such as tuples, scorers, and histograms. A master node and several slave nodes are 




used to implement the computing process. Each slave node is connected to several threads that cooperate to 
output the preliminary computing results, and once the results from each node are merged, they form the final 
output, which is sent to the master node. In this study, the final output is the energy deposition collected from 


the histograms. 


In this study, a parallel interface and DR application were developed. The DR application in this study included 
a DR system and function-calling scripts that deployed parallel interfaces. In addition to incorporating the X- 
ray tube, flat panel detector, and test model in the geometric modeling, Monte Carlo simulation parameters 
such as particle definition, physical process, and data collection were specified by the imaging system to 
accurately simulate X-ray interactions with the imaged object. The parallel interface connected the parallel 
computing functions to a DR system by transmitting data to different nodes and threads. Relevant scripts were 
developed to construct the interface. A combination of nodes and threads was defined for the submission of 
the parallel input script, then the energy deposition was collected in the automated merging of the output data 


script. 


Fig. 2 shows the application of Geant4-based parallel computing to the simulation of a DR system. The blue
boxes represent the DR system and parallel interface, whereas the yellow box displays the 3-D reconstruction
process, all of which were developed using the methods described in this article. The red box represents the
Geant4 kernel and the purple box represents the HPC platform developed by Chengdu HPC. The script extracts of the
parallel input submission, energy deposition results collection, grayscale transform, and 3-D reconstruction
are shown below Fig. 2.


By employing the interface and application, the simulation of the DR was enabled by G4-MPI and G4-MT, 


which form the basis of the following sections.

Fig. 2 Overview of Geant4-based parallel computing applied in DR. Red pointer line: the DR system's parameters and scripts
deploying the parallel interface are set before the parallel computing. Pink pointer lines: the mapping relationship of parallel
computing modes between the Geant4 kernel and HPC. Green pointer lines: the overall system operation flow, which starts from
the parallel interface deploying scripts and ends at the reconstructed 3-D image generation. The numbers in green indicate the
sequence of the main calculation and processing steps.


Script extract 1 Parallel Input Submission
# mpirun.sh is used to submit the parallel input files.
# p: job partition, n: number of nodes, t: threads used on each node, TN: total number of threads
#SBATCH -p p
#SBATCH -N n
#SBATCH --ntasks-per-node=t
module load mpi/openmpi/4.0.2/gcc-7.3.1
module load apps/Geant4.10.06.p02/openmpi-4.0.2-gcc-7.3.1

mpirun -np TN ./parallel_computing run.mac


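Script extract 2 Energy Deposition Results Collection
// Loop over all entries of the event hits map and accumulate the deposited energy.
for (itr = evtMap->GetMap()->begin(); itr != evtMap->GetMap()->end(); itr++)
{
    fedep1 = *(itr->second) / keV;    // energy deposition of this entry, in keV
    a[copyNb] = a[copyNb] + 1;        // number of entries counted for copy number copyNb
    b[copyNb] = b[copyNb] + fedep1;   // accumulated energy deposition for copy number copyNb
}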


Script extract 3 Grayscale Transform and 3-D Reconstruction
# The two lines below are an extract of the core code that turns energy deposition into a projection image.
readbook = np.array(data_array1).reshape((300, 300))
readbook1 = (255 * (readbook - min1) / (max1 - min1)).astype(np.int16)
# The four lines below turn projection images into slicing images.
proj_fft = my_fft(R, width)
proj_filtered[:, i] = np.multiply(proj_fft[:, i], R_L_filter)
proj_ifft = (my_ifft(proj_filtered)).real
fbp[x, y] = fbp[x, y] + proj_ifft[t, i]


4. Parallel performance evaluation of G4-MPI and G4-MT 


This study utilized Chengdu HPC as the computational platform. In this HPC, each node was configured with
a 32-core x86 processor with a main frequency of 2.5 GHz. The MPI interface adopts hpcx-2.4.1 and is
compiled with gcc 7.3.1, which is compatible with the G4-MPI and G4-MT environments. To evaluate the
capabilities of G4-MPI and G4-MT for DR, we considered three aspects: the accuracy of parallel computing
for DR, the performance of G4-MPI and G4-MT, and the application of the hybrid G4-MPI and G4-MT modes.


4.1 Accuracy of parallel computing 


This section verifies the accuracy of parallel computing by comparing the projection images obtained using
the parallel and non-parallel computing modes. In the parallel computing mode, ten nodes with full threads, which
is one case of the hybrid G4-MPI and G4-MT modes, were used, whereas in the non-parallel computing mode,
only one thread was used in the simulation. Ten billion particles with an energy of 180 keV were emitted to
ensure the convergence of the Monte Carlo simulation. The peak signal-to-noise ratio (PSNR) and root mean
square error (RMSE) [25][26] were used to measure the correctness of the parallel images, with the non-parallel
images as the benchmark. The RMSE can be calculated as


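$\mathrm{RMSE}(I_p, I_{np}) = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(I_p(i,j) - I_{np}(i,j)\right)^2}$    (4)

where $I_p$ and $I_{np}$ are the images obtained in the parallel and non-parallel computing modes, respectively, and
$M \times N$ is the image size in pixels. The PSNR can be calculated as

$\mathrm{PSNR} = 10\log_{10}\frac{(\mathrm{MAX}_I)^2}{(\mathrm{RMSE})^2}$    (5)

where $\mathrm{MAX}_I$ is the largest pixel value of an image.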


(a) Projection image of parallel mode. (b) Projection image of non-parallel mode. 
Fig. 3 Projection images of two computing modes 
Two images obtained with the two modes at the same projection angle are shown in Fig. 3; the two
images are visually indistinguishable. The RMSE and PSNR between them are 3.83 and 36.5 dB, respectively,
indicating that the two images differ only slightly in grayscale. This demonstrates that the results of the
parallel computing mode based on Geant4 remain consistent with those of the non-parallel mode; thus,
parallel computing is used in the following sections of the article.
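The comparison metrics can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the script used in this study: the function names are hypothetical, images are represented as nested lists, and `max_value=255` assumes 8-bit gray levels.

```python
import math


def rmse(img_p, img_np):
    """Root mean square error between two equally sized grayscale images."""
    squared = [(p - q) ** 2
               for row_p, row_q in zip(img_p, img_np)
               for p, q in zip(row_p, row_q)]
    return math.sqrt(sum(squared) / len(squared))


def psnr(img_p, img_np, max_value=255):
    """Peak signal-to-noise ratio in dB; max_value=255 is an assumed 8-bit range."""
    error = rmse(img_p, img_np)
    if error == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / error ** 2)


# Tiny hypothetical 2 x 2 images, used only to exercise the functions.
parallel = [[10, 20], [30, 40]]
non_parallel = [[12, 20], [30, 44]]
print(rmse(parallel, non_parallel), psnr(parallel, non_parallel))

# Consistency check of the reported values: RMSE = 3.83 with a 255 gray-level range.
reported_psnr = 20 * math.log10(255 / 3.83)
```

With a peak value of 255, an RMSE of 3.83 corresponds to a PSNR of about 36.5 dB, which agrees with the pair of values reported for Fig. 3.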
4.2 Performance evaluation of G4-MPI and G4-MT 


Six test groups were designed using different combinations of test nodes and threads for G4-MPI and G4-MT
in Chengdu HPC. To ensure the accuracy and repeatability of the simulation results, each test was repeated 30
times to obtain the population mean value $\mu_i$ and population standard deviation value $\delta_i$ [27]:

$\mu_i = \frac{1}{n}\sum_{j=1}^{n} x_i(j)$    (6)

$\delta_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(x_i(j) - \mu_i\right)^2}$    (7)

where $x_i(j)$ represents the $j$-th sample value (speedup or parallel efficiency) of test group $i$ and $n = 30$ is the number
of repetitions of each test. The confidence interval is calculated as in Eq. (8) [28].

$\mathrm{interval}(c_1, c_2) = \mu_i \pm z_{\alpha/2} \times \delta_i / \sqrt{n}$    (8)

where $\alpha$ is the significance level and $c_1$ and $c_2$ are the upper and lower bounds of the interval, respectively.
The critical value, $z_{\alpha/2}$, is calculated according to the normal distribution table considering 99.7% as the
confidence level for each test [29]. The detailed results are shown in Table 2 and Table 3. The line graphs of
parallel speedup and efficiency for the six tests are shown in Fig. 4.
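The statistics described above can be reproduced with the Python standard library. The sketch below is a generic illustration: the function name is hypothetical and the sample values are invented for demonstration, not measurements from this study.

```python
import math
from statistics import NormalDist, fmean, pstdev


def confidence_interval(samples, confidence=0.997):
    """Return (mean, population std, half-width) of a normal-theory interval."""
    n = len(samples)
    mean = fmean(samples)
    std = pstdev(samples, mu=mean)               # population standard deviation
    alpha = 1.0 - confidence                     # significance level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    return mean, std, z * std / math.sqrt(n)


# Hypothetical speedup samples for one test group (not data from this study).
samples = [2.9, 2.7, 3.1, 2.8, 3.0, 2.9]
mean, std, half_width = confidence_interval(samples)
lower, upper = mean - half_width, mean + half_width
print(f"{mean:.3f} +/- {half_width:.3f}")
```

The critical value is computed from the requested confidence level rather than read from a table, so the same function can be reused for other confidence levels.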


Fig. 4 shows that the speedup increases linearly with the number of nodes or threads, whereas the parallel 
efficiency tends to stabilize at a relatively high value as the number of nodes or threads increases. Furthermore, 
according to Table 2 and Table 3, with a 99.7% confidence level, the true values of the speedup and parallel 
efficiency of the six tests lie within the confidence intervals for both G4-MT and G4-MPI. This demonstrates
that both G4-MT and G4-MPI have good repeatability and efficiency in simulating the DR process. The 
speedup and efficiency of G4-MPI are marginally higher than those of G4-MT because G4-MPI adopts a 
shared memory mode, which improves computing by having each thread share information during Monte 


Carlo calculations. 


Table 2 Parallel performance of G4-MPI 


G4-MPI                           Speedup (Sn)                        Efficiency (En)
Combination of node and thread   μi      δi     Interval (c1, c2)    μi (%)   δi     Interval (c1, c2)
1×1                              1       0.04   1 ± 0.02             100      0.33   100 ± 0.16
1×3                              2.886   0.26   2.886 ± 0.12         96.2     3.23   96.2 ± 1.52
1×5                              4.710   0.31   4.710 ± 0.14         94.2     3.73   94.2 ± 1.76
1×7                              6.729   0.37   6.729 ± 0.17         96.1     3.13   96.1 ± 1.47
1×10                             9.424   0.43   9.424 ± 0.20         94.2     3.75   94.2 ± 1.77


Table 3 Parallel performance of G4-MT 


G4-MT                            Speedup (Sn)                        Efficiency (En)
Combination of node and thread   μi      δi     Interval (c1, c2)    μi (%)   δi     Interval (c1, c2)
1×1                              1       0.04   1 ± 0.02             100      0.33   100 ± 0.16
3×1                              2.886   0.26   2.886 ± 0.12         96.4     3.23   96.4 ± 1.51
5×1                              4.895   0.31   4.710 ± 0.15         97.9     3.73   97.9 ± 1.18
7×1                              6.831   0.37   6.729 ± 0.17         97.6     3.13   97.6 ± 1.29
10×1                             9.194   0.43   9.424 ± 0.20         91.9     3.75   91.9 ± 1.75

Fig. 4 Parallel speedup and efficiency of G4-MPI and G4-MT


4.3 Scalability evaluation of hybrid G4-MPI and G4-MT for DR 


In this section, ten tests for the DR simulation are designed to evaluate the scalability of the hybrid G4-MPI
and G4-MT modes. The G4-MT in this section ran in the full-thread mode, with 32 threads on each computational
node. The number of nodes in the ten tests ranged from 1 to 10. Ten billion X-ray particles were emitted during each
test. To evaluate the scalability, the runtime of one node using one thread was used as the
benchmark time for the ten tests. The results of the parallel computing of the DR simulation are listed in Table 4,
and the parallel speedup and efficiency of the ten tests are plotted in Fig. 5.


Table 4 and Fig. 5 demonstrate that as the number of nodes increases, the speedup deviates from linear growth
and the parallel efficiency decreases within a certain range. Amdahl's Law [30] describes
how speedup increases theoretically when multiple processors are used in parallel computing. Because different nodes
incur parallel overhead time, such as communication and waiting, the parallel speedup deviates
from the ideal value as the number of nodes increases, which leads to a decrease in parallel efficiency. This is
acceptable in this study because the speedup can reach up to 151.36 and the reduction in runtime is significant.
For example, in the next section, 180 projection images were required for reconstruction. Based on the runtime
required for one projection image using one thread (46,468 s), it can be estimated that 2,323.4 h would be required. However,
using G4-MPI in full threads on ten nodes (307 s per image), the process costs only approximately 15.35 h in total. The strong
scalability of G4-MPI in the full-thread mode for DR simulation was thus demonstrated.
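The runtime estimate above follows directly from the run times listed in Table 4. The short sketch below reproduces the arithmetic, assuming that each of the 180 projections takes the same time as the benchmarked one; the variable names are illustrative.

```python
SECONDS_PER_HOUR = 3600
N_PROJECTIONS = 180

t_single_thread = 46468  # s per projection, 1 node x 1 thread (Table 4)
t_hybrid = 307           # s per projection, 10 nodes x 32 threads (Table 4)

serial_hours = N_PROJECTIONS * t_single_thread / SECONDS_PER_HOUR
hybrid_hours = N_PROJECTIONS * t_hybrid / SECONDS_PER_HOUR
speedup = t_single_thread / t_hybrid

print(f"single thread: {serial_hours:.1f} h, hybrid: {hybrid_hours:.2f} h, "
      f"speedup: {speedup:.2f}")
```

The ratio of the two totals equals the per-projection speedup, so the saving for the full set of projections is the same factor as for a single image.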


Table 4 Scalability test of hybrid G4-MPI and G4-MT for DR simulations 


Combination of node and thread   Run time (s)   Parallel speedup (Sn)   Parallel efficiency (En)
1×1                              46468          1                       100%
1×32                             1607           28.91                   90.36%
2×32                             864            53.78                   84.04%
3×32                             623            74.58                   77.69%
4×32                             514            90.40                   70.63%
5×32                             450            103.26                  64.38%
6×32                             410            113.33                  59.03%
7×32                             381            121.96                  54.45%
8×32                             354            131.26                  51.28%
9×32                             330            140.81                  48.89%
10×32                            307            151.36                  47.35%

Fig. 5 Scalability evaluation of hybrid G4-MPI and G4-MT


5. 3-D reconstruction of the model and quality analysis of image 


This section implements the DR simulation to obtain sufficient original projection images using parallel
computing with G4-MPI in full threads and uses the projection images to reconstruct the test model in three
dimensions to obtain information about the defects inside the model. The simulation was run 180
times, with the test model rotated by 1° each time, as shown in Fig. 6; thus, 180 2-D grayscale projection
images, each 300 × 300 pixels in size, were generated. In each simulation, the X-ray generator emitted
10 billion gamma particles with an energy of 180 keV in 10-node mode with 32 threads per node. The rotation
of the test model was implemented automatically by a script developed in this study.


Fig. 6 Rotational imaging schematic 


5.1 3-D reconstruction using filtered back projection algorithm 


The FBP algorithm is widely used. It is a spatial processing technique based on Fourier transform theory [31]. 
One of its characteristics is that the projection under each acquisition projection angle is convolved by specific 
filters before back-projection, such that the toroidal artifact caused by the point spread function can be reduced. 
This study adopted the FBP algorithm to obtain the 3-D reconstructed images of the model. The FBP process 


is described as follows. 


• Step 1: Obtain one 300 × 180 original projection matrix using the same row of pixel information in
all the original projection images.
• Step 2: Pre-process the original projection matrix data for calibration.
• Step 3: Conduct a one-dimensional Fourier transform on the result of Step 2.
• Step 4: Perform convolution filtering on the result of Step 3.
• Step 5: Conduct a one-dimensional inverse Fourier transform on the result of Step 4.
• Step 6: Perform a direct back projection on the result of Step 5 to obtain the 2-D reconstructed slicing
image.
• Step 7: Repeat Steps 1 to 6 300 times to produce 300 2-D slicing images.
• Step 8: Use the 300 images to obtain 3-D reconstructed images.


Step 1 takes the N-th row of the pixel information of each original projection image. N is initially equal to
one and increases by one with each repetition. Thus, 300 original 300 × 180 projection matrices are generated
after 300 repetitions, and three hundred 2-D reconstructed slicing images, each 300 × 300 pixels in size, are
generated.
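The core of Steps 3 to 6 can be illustrated with a small, self-contained sketch. It is not the script used in this study. It assumes parallel-beam geometry, omits the pre-calibration of Step 2, and applies the Ram-Lak filter as a spatial-domain convolution, which stands in for the Fourier-domain filtering of Steps 3 to 5. All function names are hypothetical, and the sinogram is indexed as [angle][detector bin].

```python
import math


def ram_lak_kernel(half_width):
    """Discrete spatial-domain Ram-Lak kernel for unit detector spacing."""
    kernel = {}
    for n in range(-half_width, half_width + 1):
        if n == 0:
            kernel[n] = 0.25
        elif n % 2:                                # odd offsets
            kernel[n] = -1.0 / (math.pi * n) ** 2
        else:                                      # even offsets
            kernel[n] = 0.0
    return kernel


def filter_projection(projection, kernel):
    """Convolve one projection (one sinogram row) with the kernel."""
    size = len(projection)
    return [sum(projection[k] * kernel[n - k] for k in range(size))
            for n in range(size)]


def fbp_slice(sinogram, angles_deg, size):
    """Reconstruct a size x size slice by filtered back projection."""
    kernel = ram_lak_kernel(size - 1)
    center = (size - 1) / 2.0
    image = [[0.0] * size for _ in range(size)]
    for projection, angle in zip(sinogram, angles_deg):
        filtered = filter_projection(projection, kernel)
        cos_a = math.cos(math.radians(angle))
        sin_a = math.sin(math.radians(angle))
        for row in range(size):
            for col in range(size):
                # Detector coordinate of the ray passing through this pixel.
                t = (col - center) * cos_a + (row - center) * sin_a + center
                low = math.floor(t)
                if 0 <= low < size - 1:            # linear interpolation between bins
                    frac = t - low
                    image[row][col] += ((1 - frac) * filtered[low]
                                        + frac * filtered[low + 1])
    scale = math.pi / len(angles_deg)
    return [[value * scale for value in row] for row in image]


# Usage: analytic sinogram of a centered uniform disk (radius 8, density 1).
size, radius = 33, 8.0
center = (size - 1) / 2.0
profile = [2.0 * math.sqrt(max(radius ** 2 - (k - center) ** 2, 0.0))
           for k in range(size)]
angles = list(range(180))              # 1-degree steps over 180 degrees
sinogram = [profile] * len(angles)     # a centered disk looks the same from every angle
slice_image = fbp_slice(sinogram, angles, size)
print(f"center: {slice_image[16][16]:.3f}, outside: {slice_image[16][30]:.3f}")
```

For this synthetic phantom, the reconstructed value at the center should be close to the disk density of 1, while pixels outside the disk should be close to 0. Repeating the reconstruction for each detector row and stacking the slices corresponds to Steps 7 and 8.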


As the toroidal artifact is one of the main factors affecting the quality of 3-D reconstructed images, this study 
adopted a projection data pre-calibration method combining polynomial fitting and the probability statistics 
correction method in Step 2 [32]. The range of alternative correction factors was determined by polynomial 
fitting of the projected data column-by-column, and correction factors were determined using the maximum 


probability principle. 


Among the eight steps above, the design of the filter in Step 4 has a significant impact on the final
reconstruction results [33]. Selecting a filter amounts to selecting a window function; to obtain
better resolution in the reconstructed image, the inverse Fourier transform of the window function should have a
high and narrow central peak [34]. To obtain reconstructed images with better quality, this study used
three commonly used filters, the Ram-Lak, Shepp-Logan, and Hamming filters, to perform
convolution filtering on the result of the one-dimensional Fourier transform. The 2-D reconstructed slicing
image under 0° rotation is consistent with that shown in Fig. 1.
5.2 Quality evaluation of reconstructed images 


To quantitatively measure the quality of the reconstructed images, three commonly used image quality 
assessment metrics, contrast-to-noise ratio (CNR) [35], average gradient (AG) [36], and image entropy (IE) 
[37], were used. CNR is defined as the ratio of peak signal intensity to background intensity and is an objective 
indicator of average image quality. Its value is proportional to the image quality, and a higher CNR value 
indicates a better recognition of target defects. AG is sensitive to the ability of an image to express small details 
in contrast. In a certain direction of an image, AG tends to be larger with a greater change in gray level. Thus, 
AG could be used to measure the clarity of an image and reflects small detail contrasts and texture 
transformation features in an image. IE reflects the average amount of information in the image and represents
the aggregation characteristics of the image grayscale distribution. The greater the entropy of the image, the
richer the pixel grayscale contained in the image, and the more uniform the grayscale distribution.


The results of the reconstructed image quality assessment of the model used in this article are shown in Table
5. Based on the spatial characteristics of the model, this study presents the results of a quality assessment of
the reconstructed images of the 100th, 150th, and 200th layers of the test model. The reconstructed slicing images
of the three layers depict the hollow defects and solid aluminum. The quality of the
reconstructed images varied considerably with the filter and with the slicing layer.


First, the reconstructed slicing images using the Ram-Lak filter had better quality than those using the
Shepp-Logan or Hamming filters. Second, there was a high degree of similarity between the metric values of the
100th and 200th layers because of the consistent physical structures of the two slices in the model, such as the
density, material, and shape of the defect. Moreover, the CNRs of the 100th and 200th slices are smaller than
that of the 150th slice, mainly because the defect area in the 150th slice of the image is smaller than those of
the 100th and 200th slices. This demonstrates that the 3-D reconstruction method based on parallel computing has
good recognition of small defects. Furthermore, the AG and IE of the 100th and 200th slices were larger than
those of the 150th slice because of the greater change in the grayscale value of the former. It can be concluded that
the images of the 100th and 200th slices showed more detail in contrast. In general, the values of the three
metrics meet the expectations for the designated model. Fig. 7 shows the reconstructed 2-D images
using the three aforementioned filters. It is evident that the toroidal artifacts are more prominent in the images
reconstructed with the Shepp-Logan and Hamming filters than in those reconstructed with the Ram-Lak
filter.


Table 5 Metrics of reconstructed images 


Layer   Filter        CNR    AG         IE
100th   Ram-Lak       8.65   0.038840   0.73650
        Shepp-Logan   8.22   0.041788   0.72003
        Hamming       2.86   0.013373   0.71352
150th   Ram-Lak       9.03   0.031075   0.38491
        Shepp-Logan   8.11   0.033115   0.37352
        Hamming       2.35   0.013373   0.37053
200th   Ram-Lak       8.34   0.038681   0.74198
        Shepp-Logan   7.86   0.043203   0.72891
        Hamming       2.75   0.013372   0.71265


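$\mathrm{CNR} = \frac{|\mu_i - \mu_o|}{\sqrt{\delta_i^2 + \delta_o^2}}$, with $\delta_i^2 = E\{(|s_i|^2 - \mu_i)^2\}$ and $\delta_o^2 = E\{(|s_o|^2 - \mu_o)^2\}$, where $\mu_i$ and $\mu_o$ are the average pixel values of the region of
interest (ROI) and background, respectively, and $\delta_i^2$ and $\delta_o^2$ are the variances of the ROI and background, respectively.

$\mathrm{AG} = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\sqrt{\frac{1}{2}\left[\left(\frac{\partial f(i,j)}{\partial x}\right)^2 + \left(\frac{\partial f(i,j)}{\partial y}\right)^2\right]}$, where $M \times N$ is the pixel size of the image, and $\frac{\partial f(i,j)}{\partial x}$ and $\frac{\partial f(i,j)}{\partial y}$ are the differences of the
pixel value in the x and y directions, respectively.

$\mathrm{IE} = -\sum_{g=0}^{L-1} P_g \log_2 P_g$, where $L$ is the overall grayscale level of the image and $P_g$ is
the probability of gray level $g$.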
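Two of these metrics are simple enough to sketch directly. The code below is an illustrative implementation, not the evaluation script of this study. The function names are hypothetical, images are nested lists of gray levels, and AG uses forward differences normalized by the number of interior positions, which is one common convention.

```python
import math
from collections import Counter


def image_entropy(image):
    """IE = -sum over gray levels g of P_g * log2(P_g)."""
    pixels = [value for row in image for value in row]
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def average_gradient(image):
    """AG with forward differences; the last row and column are skipped."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            dx = image[i][j + 1] - image[i][j]   # difference in the x direction
            dy = image[i + 1][j] - image[i][j]   # difference in the y direction
            total += math.sqrt((dx * dx + dy * dy) / 2.0)
    return total / ((rows - 1) * (cols - 1))


# Hypothetical test images: a flat image and a two-level checkerboard.
flat = [[5, 5], [5, 5]]
checker = [[0, 1], [1, 0]]
print(image_entropy(flat), image_entropy(checker))
print(average_gradient(flat), average_gradient(checker))
```

A flat image carries no information and has no gradient, so both metrics vanish, whereas a two-level image with equally frequent gray levels has an entropy of exactly 1 bit.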


To further discuss the quality of the 3-D-reconstructed images, we compared the defect situation of the model 
in the ideal case with that of the reconstructed slicing image sets. As the test model was accurately modeled 
using Geant4, this study utilized the parameter table as an inspection standard for the set of 300 reconstructed 


slicing images. 


Among the 300 images were some blank images because the size of the flat-panel detector was larger than 
that of the test model. During this process, the X-ray particles were captured by a flat-panel detector without 
decay. In this case, after the data of the associated units on the flat-panel detector were processed, the 
corresponding pixels appeared blank. 


Excluding the 100 blank images, the model was divided into 200 slices. A total of 148 slices contained both small square and cylindrical defects, whereas 52 slices contained only cylindrical defects. The ratio of layers containing the square defect to non-blank layers (r1) is 0.74:1, whereas the corresponding ratio (R1) according to the parameter table (Table 1) is 0.75:1. The ratio of the side length of the small square defect to the side length of the large square (r2) in Fig. 7 is 0.37489:1, whereas the corresponding ratio (R2) according to Table 1 is 0.375:1. The ratio of the diameter of the round defect to the side length of the large square (r3) is 0.0749:1, whereas the corresponding ratio (R3) according to Table 1 is 0.075:1. The differences between the measured ratios r1, r2, and r3 and their reference values R1, R2, and R3 are within the acceptable error range.


Examining the images of the remaining layers showed that the results were similar. 
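As a worked check on the ratios above, the relative deviation of each measured ratio from its reference value can be computed directly from the numbers quoted in the text. The relative deviation used here is an assumed measure; the study does not state how the acceptable error range was defined.

```latex
\[
\frac{|r_1 - R_1|}{R_1} = \frac{|0.74 - 0.75|}{0.75} \approx 1.33\%, \qquad
\frac{|r_2 - R_2|}{R_2} = \frac{|0.37489 - 0.375|}{0.375} \approx 0.029\%, \qquad
\frac{|r_3 - R_3|}{R_3} = \frac{|0.0749 - 0.075|}{0.075} \approx 0.13\%.
\]
```

With the value 0.7485:1 listed for the first ratio in Table 6, its deviation would be 0.2%.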


Table 6 Proportional compatibility of the test model
Ratio  R1      r1        R2       r2         R3       r3
Value  0.75:1  0.7485:1  0.375:1  0.37489:1  0.075:1  0.0749:1


Fig. 7 Representative reconstructed slicing images: (a) 100th, (b) 150th, and (c) 200th slices, each reconstructed with the Hamming, Shepp-Logan, and Ram-Lak filters
Fig. 8 illustrates the grayscale value variation curves of the same row of pixels in two layers of the reconstructed slicing images. The two images were selected, according to the parameter table, as layers whose defects are theoretically identical, such as two adjacent layers. Owing to the hollow defects inside the model, the reconstructed slicing images exhibited obvious differences in the grayscale values along specific pixel rows. Fig. 8 shows that the grayscale value varied within [0, 1] where defects appeared in the image, and that the curve of one reconstructed slicing image coincided closely with that of the adjacent layer. The RMSE values of the three sets of pixel data were 0.000378, 0.000230, and 0.000496, indicating a close match within each set of pixel data.
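The RMSE comparison of two row profiles can be sketched as below. This is a minimal illustration under the assumption that each profile is the same pixel row taken from two adjacent slices and stored as a vector of grayscale values in [0, 1]; the function name `rmse` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Root-mean-square error between two equally long grayscale profiles,
// e.g. the same pixel row taken from two adjacent reconstructed slices.
double rmse(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size() && !a.empty());
    double sumSq = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double d = a[k] - b[k];
        sumSq += d * d;
    }
    return std::sqrt(sumSq / static_cast<double>(a.size()));
}
```

A value close to zero, as reported above, means the two profiles are nearly indistinguishable.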


To obtain the complete 3-D reconstruction view shown in Fig. 9, a 3-D model was formed by stacking 180 2-D slice images in the correct order in 3-D space. However, inconsistencies between the images can blur the final 3-D model. To address this issue, a smoothing filter was applied so that the result closely restored the original model. The reconstructed surface geometry had almost no holes, indicating a high level of accuracy in the 3-D reconstruction process. The reconstructed model reproduced the defect features and external shapes modeled in Geant4 very well. The disadvantage of this method is that some irregularities remain on the surface, which may be caused by incomplete elimination of toroidal artifacts.
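The stacking and smoothing steps can be sketched as follows. The study does not specify which smoothing filter was used, so the 3 × 3 × 3 box (mean) filter here is an assumption chosen for brevity; the `Volume` struct and the function names are likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical volume type: slices stacked along z, each slice row-major.
struct Volume {
    std::size_t nx, ny, nz;
    std::vector<double> v;
    double& at(std::size_t x, std::size_t y, std::size_t z) { return v[(z * ny + y) * nx + x]; }
    double at(std::size_t x, std::size_t y, std::size_t z) const { return v[(z * ny + y) * nx + x]; }
};

// Stack equally sized 2-D slices, in the given order, into one volume.
Volume stackSlices(const std::vector<std::vector<double>>& slices,
                   std::size_t nx, std::size_t ny) {
    Volume vol{nx, ny, slices.size(), {}};
    vol.v.reserve(nx * ny * slices.size());
    for (const auto& s : slices) {
        assert(s.size() == nx * ny);
        vol.v.insert(vol.v.end(), s.begin(), s.end());
    }
    return vol;
}

// 3x3x3 box (mean) filter; neighbors outside the volume are skipped,
// so border voxels are averaged over fewer samples.
Volume boxSmooth(const Volume& in) {
    Volume out{in.nx, in.ny, in.nz, std::vector<double>(in.v.size(), 0.0)};
    const auto nx = static_cast<std::ptrdiff_t>(in.nx);
    const auto ny = static_cast<std::ptrdiff_t>(in.ny);
    const auto nz = static_cast<std::ptrdiff_t>(in.nz);
    for (std::ptrdiff_t z = 0; z < nz; ++z)
        for (std::ptrdiff_t y = 0; y < ny; ++y)
            for (std::ptrdiff_t x = 0; x < nx; ++x) {
                double sum = 0.0;
                int count = 0;
                for (std::ptrdiff_t dz = -1; dz <= 1; ++dz)
                    for (std::ptrdiff_t dy = -1; dy <= 1; ++dy)
                        for (std::ptrdiff_t dx = -1; dx <= 1; ++dx) {
                            const std::ptrdiff_t xx = x + dx, yy = y + dy, zz = z + dz;
                            if (xx < 0 || xx >= nx || yy < 0 || yy >= ny || zz < 0 || zz >= nz)
                                continue;
                            sum += in.at(static_cast<std::size_t>(xx),
                                         static_cast<std::size_t>(yy),
                                         static_cast<std::size_t>(zz));
                            ++count;
                        }
                out.at(static_cast<std::size_t>(x), static_cast<std::size_t>(y),
                       static_cast<std::size_t>(z)) = sum / count;
            }
    return out;
}
```

A surface for the 3-D view would then be extracted from the smoothed volume; that step is outside this sketch.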
Fig. 8 Grayscale value variation curves 
Fig. 9 Reconstructed image of 3-D view of the test model 


6. Application on a complex model 


The complex model is a plug used in a rock layer probe. It is made of aluminum and was exported to Geant4 using GMAC, software developed in-house [38]. First, the complex model was imported from computer-aided design (CAD) software into GMAC, where its materials were defined and assigned. Next, its center coordinates were set at the center of the Geant4 world coordinate system. Subsequently, it was converted into a high-fidelity model and imported into the Geant4 simulation system shown in Fig. 10. The location of the X-ray source, the material of the flat-panel detector, and the manner in which the X-ray particles were emitted and collected were the same as for the test model above, but the size of the detector was 30 cm × 30 cm.


To obtain projection images from 180 angles, the imaging simulation was performed 180 times; each time, the model was rotated by 1°, and the simulation was run in the hybrid G4-MT and G4-MPI mode with 320 threads. During each simulation, 10 billion X-ray particles with an energy of 150 keV were emitted, and the resulting projection image had a resolution of 600 × 600 pixels. Parallel computing shortened the run time considerably: the average run time of each simulation was 313 s, whereas the run time in the non-parallel computing mode was 46,073 s. The parallel speedup was therefore up to 147.2, and the parallel efficiency was approximately 46%.


A total of 180 original projection images were used to generate 2-D reconstructed images using the FBP algorithm with a Ram-Lak filter. The model in CAD is shown in Fig. 11(a), and the 2-D reconstructed images of the 150th, 300th, and 450th slices are shown in Fig. 11(b), (c), and (d), respectively. These three slices were chosen for evaluation because they are representative: they divide the model into four equal parts. Fig. 11 shows that the visual characteristics of the three images are consistent with the corresponding slices of the complex model. For example, the 150th slice in Fig. 11(b) is composed of two separate ellipse-like shanks, as shown in Fig. 11(a). The three metrics, CNR, AG, and IE, are listed in Table 7; their values indicate that the reconstructed images are of good quality. A 3-D reconstructed view of the model is displayed in Fig. 12, which restores numerous details of the model.
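The speedup and efficiency reported for the complex model follow directly from the two run times, taking speedup as the ratio of non-parallel to parallel run time and efficiency as speedup divided by the number of threads (the usual definitions, assumed here):

```latex
\[
S = \frac{T_{\text{serial}}}{T_{\text{parallel}}} = \frac{46{,}073\ \text{s}}{313\ \text{s}} \approx 147.2,
\qquad
E = \frac{S}{N_{\text{threads}}} = \frac{147.2}{320} \approx 0.46 .
\]
```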
Fig. 10 Geant4 simulation system for the complex model 
Fig. 11 (a) Shape of the model in CAD and illustration of which layer of the model these three images are (b), (c), and (d) 2-D 
Fig. 12 3-D reconstructed view of the complex model 


Table 7 Metrics of three slicing images with Ram-Lak filter 


Complex model slice  CNR   AG      IE
150th                4.13  7.2291  0.15260
300th                4.48  5.7916  1.33991
450th                2.93  8.6436  1.04886


7. Conclusion 


This study proposes a parallel computing approach to Monte Carlo simulation for the 3-D reconstruction of a test model. This was achieved by developing a parallel interface and a DR simulation application. The interface passed information between Geant4-based parallel computing and the DR simulation system, thereby optimizing the use of computational resources. The application included a DR simulation system and function-calling scripts. The performances of the two parallel computing modes, G4-MPI and G4-MT, were compared by performing a DR simulation process with various combinations of threads and nodes. The results demonstrated that the speedup increased approximately linearly as the number of threads increased, while the parallel efficiency was maintained at over 94% for both modes. The hybrid G4-MPI and G4-MT mode has strong scalability and can significantly accelerate the Monte Carlo simulation, with a speedup of up to 151.36. The quality of the reconstructed images was good, and the images accurately reflect the details of the defects inside the model. Future work will focus on transforming complex parts into Geant4 models for imaging simulations by integrating the computer-aided design (CAD) function of radiation and the prior information on object geometry in the X-ray reaction process. We will also use a pre-calibrated deep-learning reconstruction algorithm for projection images based on sparse-view imaging to eliminate circular artifacts from the 2-D reconstructed slice images and obtain the best 3-D reconstruction result.
Appendix 1 


Scripts extract 4 Parallel computing mode of hybrid MT and MPI 


// Turn on the multithreaded run manager when Geant4 is built with MT support
#ifdef G4MULTITHREADED
  G4MTRunManager* runManager = new G4MTRunManager;
#else
  G4RunManager* runManager = new G4RunManager;
#endif

// Multithreaded: save g4analysis objects to a merged histogram file,
// one file per MPI process ("rank" is the MPI rank, defined elsewhere)
if (true) {
  std::ostringstream fname;
  fname << "dose-rank" << rank;
  HistoManager* Histo = HistoManager::GetAnalysis();
  Histo->Save(fname.str());
}

// Multi-node: merge the G4Run objects; child nodes merge into the master node
// ("arun" points to the run being merged, defined elsewhere)
RunMerger rm(static_cast<const Run*>(arun));
G4int ver = 0;  // verbosity level of the merger
rm.SetVerbosity(ver);
rm.Merge();